A Corpus of Literal and Idiomatic Uses of German Infinitive-Verb Compounds

Authors

  • Andrea Horbach
  • Andrea Hensler
  • Sabine Krome
  • Jakob Prange
  • Werner Scholze-Stubenrecht
  • Diana Steffen
  • Stefan Thater
  • Christian Wellner
  • Manfred Pinkal
Abstract

We present an annotation study on a representative dataset of literal and idiomatic uses of infinitive-verb compounds in German newspaper and journal texts. Infinitive-verb compounds pose a challenge for writers of German because the spelling regulations differ between literal and idiomatic uses. With the participation of expert lexicographers, we obtained a high-quality corpus resource, which we offer as a testbed for automatic idiomaticity detection and coarse-grained word-sense disambiguation. A classifier trained on the corpus distinguished literal from idiomatic uses with an accuracy of 85%.

Related resources

A Survey of Idiomatic Preposition-Noun-Verb Triples on Token Level

Most of the research on the extraction of idiomatic multiword expressions (MWEs) focused on the acquisition of MWE types. In the present work we investigate whether a text instance of a potentially idiomatic MWE is actually used idiomatically in a given context or not. Inspired by the dataset provided by (Cook et al., 2008), we manually analysed 9,700 instances of potentially idiomatic preposit...

How to Account for Idiomatic German Support Verb Constructions in Statistical Machine Translation

Support-verb constructions (i.e., multiword expressions combining a semantically light verb with a predicative noun) are problematic for standard statistical machine translation systems, because SMT systems cannot distinguish between literal and idiomatic uses of the verb. We work on the German to English translation direction, for which the identification of support-verb constructions is chall...

The VNC-Tokens Dataset

Idiomatic expressions formed from a verb and a noun in its direct object position are a productive cross-lingual class of multiword expressions, which can be used both idiomatically and as a literal combination. This paper presents the VNC-Tokens dataset, a resource of almost 3000 English verb–noun combination usages annotated as to whether they are literal or idiomatic. Previous research using...

German Perception Verbs: Automatic Classification of Prototypical and Multiple Non-literal Meanings

This paper presents a token-based automatic classification of German perception verbs into literal vs. multiple non-literal senses. Based on a corpus-based dataset of German perception verbs and their systematic meaning shifts, we identify one verb of each of the four perception classes optical, acoustic, olfactory, haptic, and use Decision Trees relying on syntactic and semantic corpus-based f...

Idiomaticity and Collocation: The Case of Phrasal Verbs

This paper is part of my thesis which aims to examine the lexical and grammatical patterns of phrasal verbs. A phrasal verb is a two-word verb which consists of a main verb and a particle, both elements contributing to its meaning. The meanings of phrasal verbs can range across different degrees of idiomaticity. They can be classified as literal, semi-figurative and idiomatic phrasal verbs in t...

Publication date: 2016